Amazon | Data Engineer | 4 YOE



Round 1: Data Modeling & SQL

🔹Write a SQL query to fetch the top 3 highest-earning employees from each department. 

🔹How would you design a data model for an e-commerce application (customers, products, orders)? 

🔹Explain normalization and denormalization—when would you use each? 

🔹What is a composite primary key, and in which scenario would you use it? 

🔹How do you handle performance tuning in SQL queries? 

🔹How would you implement slowly changing dimensions (SCDs) in a data warehouse?
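For the top-3-per-department question, one common approach is a window function. Below is a minimal sketch, not a definitive answer: the `employees` table, its columns, and the sample rows are all made up for illustration, and it runs on an in-memory SQLite database (window functions need SQLite 3.25 or newer). It uses DENSE_RANK so that tied salaries share a rank; if the interviewer wants exactly three rows per department, ROW_NUMBER would be the swap.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary INTEGER
);
INSERT INTO employees (name, department, salary) VALUES
    ('Asha', 'Data', 150),
    ('Ben',  'Data', 140),
    ('Chen', 'Data', 140),
    ('Dia',  'Data', 120),
    ('Eli',  'Data', 100),
    ('Fay',  'Ops',   90),
    ('Gus',  'Ops',   80);
""")

# Rank salaries inside each department, then keep ranks 1 to 3.
# DENSE_RANK gives tied salaries the same rank without leaving gaps.
TOP3_SQL = """
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           DENSE_RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS rnk
    FROM employees
)
WHERE rnk <= 3
ORDER BY department, salary DESC, name
"""

top_earners = conn.execute(TOP3_SQL).fetchall()
for row in top_earners:
    print(row)
```

With this sample data, Ben and Chen tie on salary, so the Data department returns four rows covering its three highest distinct salaries, and Eli is excluded.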

Round 2: Big Data & Distributed Systems

🔹How would you design a data pipeline to process 1 TB of data daily in real time? 

🔹Explain the differences between Hadoop, Spark, and Flink. Which one would you choose for real-time data processing and why? 

🔹How do you optimize data storage for large-scale datasets on AWS S3?

🔹Explain partitioning in Hive and how it improves query performance. 

🔹How would you process a huge dataset using AWS Glue or EMR? 
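For the Hive partitioning question, the core idea is that each partition value gets its own directory (for example `dt=2024-01-01/`), so a query that filters on the partition column only has to read the matching directories. Here is a rough sketch of that idea in plain Python, without Hive itself: the `dt`, `order_id`, and `amount` fields and the helper names are hypothetical, and the local temp directory stands in for HDFS or S3.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root):
    """Write rows into Hive-style directories: <root>/dt=<date>/part-0.csv."""
    for row in rows:
        part_dir = Path(root) / f"dt={row['dt']}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.writer(f)
            if is_new:
                # The partition column lives in the path, not in the file.
                writer.writerow(["order_id", "amount"])
            writer.writerow([row["order_id"], row["amount"]])


def read_partition(root, dt):
    """Sum amounts for one date, opening only that date's directory."""
    files_opened = 0
    total = 0.0
    for path in sorted((Path(root) / f"dt={dt}").glob("*.csv")):
        files_opened += 1
        with path.open(newline="") as f:
            for rec in csv.DictReader(f):
                total += float(rec["amount"])
    return total, files_opened


# Made-up sample data covering three dates.
rows = [
    {"dt": "2024-01-01", "order_id": 1, "amount": 10.0},
    {"dt": "2024-01-01", "order_id": 2, "amount": 5.5},
    {"dt": "2024-01-02", "order_id": 3, "amount": 7.0},
    {"dt": "2024-01-03", "order_id": 4, "amount": 1.0},
]

with tempfile.TemporaryDirectory() as root:
    write_partitioned(rows, root)
    partitions = sorted(p.name for p in Path(root).iterdir())
    total, files_opened = read_partition(root, "2024-01-02")

print(partitions, total, files_opened)
```

The filter on `dt` touches one file out of three here. At warehouse scale the same pruning is what keeps a single-day query from scanning the whole table.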

Round 3: AWS & Cloud Data Engineering

🔹Describe your experience with AWS Redshift and how you optimized query performance. 

🔹How would you architect a scalable ETL pipeline using AWS Lambda and Step Functions? 

🔹Explain how you would handle security and data governance in an AWS data lake setup. 

🔹Discuss your experience with AWS Glue, Redshift, and S3. What are the best practices for optimizing storage and retrieval? 

🔹How would you implement real-time data ingestion and processing using Kinesis or Kafka?
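For the Lambda and Step Functions question, one way to frame an answer is a state machine that chains one Lambda per ETL stage, with retries on each task. The sketch below builds such a definition in Amazon States Language as a Python dict. It is only an illustration under assumptions: the ARNs, account ID, region, function names, and retry numbers are placeholders, and a real pipeline would likely add Catch handlers, Map states for fan-out, and an alerting step.

```python
import json


def lambda_arn(name):
    """Hypothetical ARN builder; region and account ID are placeholders."""
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


def task(name, next_state=None):
    """Build one Task state with a basic retry policy."""
    state = {
        "Type": "Task",
        "Resource": lambda_arn(name),
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


# Illustrative extract -> transform -> load chain.
state_machine = {
    "Comment": "Illustrative ETL chain, one Lambda per stage",
    "StartAt": "Extract",
    "States": {
        "Extract": task("extract", "Transform"),
        "Transform": task("transform", "Load"),
        "Load": task("load"),
    },
}


def execution_order(definition):
    """Follow Next pointers from StartAt to list the states in run order."""
    order = []
    current = definition["StartAt"]
    while current is not None:
        order.append(current)
        current = definition["States"][current].get("Next")
    return order


definition_json = json.dumps(state_machine, indent=2)
steps = execution_order(state_machine)
print(steps)
```

Keeping each stage as its own small Lambda lets Step Functions own the orchestration, retries, and state passing, which is usually the point the interviewer is probing for.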

Round 4: Hiring Manager

🔹 Discussion of my past experience and projects, plus some resume-based questions

🔹 He wanted to know about my good and bad experiences with past employers

🔹 How would you work in a team under tight project delivery timelines?

🔹 What are you expecting in your next job role?